RECOME: A new density-based clustering algorithm using relative KNN kernel density

Authors

  • Yangli-ao Geng
  • Qingyong Li
  • Rong Zheng
  • Fuzhen Zhuang
  • Ruisi He
  • Naixue Xiong
Abstract

Discovering clusters of different shapes, densities, and scales in a dataset is a known challenging problem in data clustering. In this paper, we propose the RElative COre MErge (RECOME) clustering algorithm. The core of RECOME is a novel density measure, the Relative K nearest Neighbor Kernel Density (RNKD). RECOME identifies core objects as those with unit RNKD, and partitions non-core objects into atom clusters by successively following higher-density neighbor relations toward core objects. Core objects and their corresponding atom clusters are then merged through α-reachable paths on a KNN graph. Furthermore, we discover that the number of clusters computed by RECOME is a step function of the parameter α, with jump discontinuities at a small set of values. A jump discontinuity discovery (JDD) method is proposed using a variant of Dijkstra's algorithm. RECOME is evaluated on three synthetic datasets and six real datasets. Experimental results indicate that RECOME is able to discover clusters with different shapes, densities, and scales. It achieves better clustering results than established density-based clustering methods on real datasets. Moreover, JDD is shown to be effective at extracting the jump discontinuity set of parameter α for all tested datasets, which can ease the task of data exploration and parameter tuning.
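The first two steps described in the abstract (core objects with unit relative density, then atom clusters formed by following higher-density neighbors) can be illustrated with a minimal sketch. The abstract does not give the RNKD formula, so everything specific here is an assumption made for illustration: the exponential kernel, the bandwidth `sigma` (mean distance to the k-th neighbor), the normalization by the maximum density among a point and its neighbors, and the choice of the densest neighbor as the link target. The function names are hypothetical. The α-reachable merge and JDD steps are not covered.

```python
import numpy as np

def knn(X, k):
    """Brute-force k nearest neighbors (indices and distances), self excluded."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)

def relative_knn_density(X, k):
    """Assumed form of a relative KNN kernel density (not taken from the paper)."""
    idx, d = knn(X, k)
    sigma = d[:, -1].mean()                   # assumed bandwidth
    density = np.exp(-d / sigma).sum(axis=1)  # assumed exponential kernel
    # Normalize by the highest density among the point itself and its neighbors,
    # so locally densest points get a relative density of exactly 1.
    local_max = np.maximum(density, density[idx].max(axis=1))
    return density / local_max, density, idx

def atom_clusters(rel, density, idx):
    """Link each non-core point to its densest neighbor and follow links to a core."""
    n = len(rel)
    parent = np.arange(n)
    for i in range(n):
        if rel[i] < 1.0:  # non-core: some neighbor is strictly denser
            parent[i] = idx[i][np.argmax(density[idx[i]])]
    labels = np.empty(n, dtype=int)
    for i in range(n):
        j = i
        while parent[j] != j:  # density strictly increases, so this terminates
            j = parent[j]
        labels[i] = j          # label = index of the core object reached
    return labels

# Usage on two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(100, 1, (40, 2))])
rel, density, idx = relative_knn_density(X, k=5)
labels = atom_clusters(rel, density, idx)
```

Under these assumptions, every label is the index of a core object, and several atom clusters may appear within one blob; in the method described above, merging them is the job of the α-reachable path step.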


Similar resources

KNN-kernel density-based clustering for high-dimensional multivariate data

Density-based clustering algorithms for multivariate data often have difficulties with high-dimensional data and clusters of very different densities. A new density-based clustering algorithm, called KNNCLUST, is presented in this paper that is able to tackle these situations. It is based on the combination of nonparametric k-nearest-neighbor (KNN) and kernel (KNN-kernel) density estimation. The...


KNN Kernel Shift Clustering with Highly Effective Memory Usage

This paper presents a novel clustering algorithm with highly effective memory usage. The algorithm, called kNN kernel shift, classifies samples based on the underlying probability density function. In density-based clustering algorithms, a local mode of the density represents a cluster center. It is effective to shift each sample to a point having higher density, considering the density gradient...


Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used is called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...


The Relative Improvement of Bias Reduction in Density Estimator Using Geometric Extrapolated Kernel

One of the nonparametric procedures used to estimate densities is the kernel method. In this paper, in order to reduce the bias of kernel density estimation, methods such as the usual kernel (UK), geometric extrapolation usual kernel (GEUK), a bias reduction kernel (BRK), and a geometric extrapolation bias reduction kernel (GEBRK) are introduced. Theoretical properties, including the selection of smoothness para...


Density-Based Clustering Validation

One of the most challenging aspects of clustering is validation, which is the objective and quantitative assessment of clustering results. A number of different relative validity criteria have been proposed for the validation of globular clusters. Not all data, however, are composed of globular clusters. Density-based clustering algorithms seek partitions with high density areas of points (clu...



Journal title:
  • Inf. Sci.

Volume 436-437


Publication date: 2018